[offload] Add support for fp16 training #374

anj-s · 2021-02-09T20:41:58Z

Before submitting

[X] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
[X] Did you read the contributor guideline?
[-] Did you make sure to update the docs?
[X] Did you write any new necessary tests?

What does this PR do?

Add support for fp16 training with offload. On the simple MLP model I am benchmarking, I see a 16% increase in speed and a 25% decrease in memory compared to fp32.
GradScaler does not work with offload because the grads are moved to the CPU and GradScaler only supports CUDA ops. I added this to the list of TODO features to work on.
Added a unit test and benchmark support to run this.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

anj-s · 2021-02-09T22:00:51Z

I think older PyTorch versions don't support AMP. Fixing the test failures.

blefaudeux · 2021-02-10T00:54:52Z

benchmarks/offload.py

-            logging.info(f"Memory table {prof.key_averages().table()}")
-            logging.info("Memory stats are " + str(torch.cuda.memory_stats(0)["allocated_bytes.all.peak"] / 2 ** 30))
+            logging.info(
+                "Memory stats are {:.2f}GB".format(torch.cuda.memory_stats(0)["allocated_bytes.all.peak"] / 2 ** 30)


dummy thought earlier in the day: it would be great to add some ballpark computation of the expected size at some point (given the batch size + model/shards), just for comparison
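
A rough sketch of what such a ballpark estimate could look like. Everything here is an assumption made for illustration, not fairscale code: the helper name `estimate_gpu_gib` is hypothetical, and the memory model is deliberately crude (one shard resident on the GPU at a time, holding its weights plus gradients, alongside the activations for one batch). Allocator overhead, optimizer state, and temporary buffers are ignored.

```python
def estimate_gpu_gib(
    num_params: int,
    num_shards: int,
    batch_size: int,
    activation_elems_per_sample: int,
    param_bytes: int = 4,
    activation_bytes: int = 4,
) -> float:
    """Crude expected GPU footprint in GiB (hypothetical helper).

    Assumes only one model shard is on the GPU at any time and that the
    shard keeps both its weights and their gradients there.
    """
    # Ceiling division: the largest shard bounds the footprint.
    shard_params = -(-num_params // num_shards)
    weights_and_grads = 2 * shard_params * param_bytes
    activations = batch_size * activation_elems_per_sample * activation_bytes
    return (weights_and_grads + activations) / 2 ** 30


# Example numbers are made up; compare against the measured peak
# from torch.cuda.memory_stats(0)["allocated_bytes.all.peak"] / 2 ** 30.
fp32_estimate = estimate_gpu_gib(2 ** 30, 4, 32, 2 ** 23)
fp16_estimate = estimate_gpu_gib(2 ** 30, 4, 32, 2 ** 23, activation_bytes=2)
print("Expected peak is roughly {:.2f}GiB (fp32 activations)".format(fp32_estimate))
print("Expected peak is roughly {:.2f}GiB (fp16 activations)".format(fp16_estimate))
```

Logging this next to the measured peak would make it easier to spot when consumption is far from what the batch size and shard count suggest.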

blefaudeux · 2021-02-10T00:55:18Z

fairscale/nn/misc/offload.py

@@ -148,6 +150,7 @@ def forward(ctx: Any, inputs: Any, index: int, model_slices: Any, model_instance
        return inputs if isinstance(inputs, tuple) else (inputs,)

    @staticmethod
+    @custom_bwd


could you elaborate on that ? it's new to me
edit: I looked it up, sorry for the noise, makes sense
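
For other readers who have not seen these decorators: `custom_fwd` records whether autocast was enabled during the forward pass of a custom `torch.autograd.Function`, and `custom_bwd` runs the backward pass under that same autocast state. A minimal sketch on a toy op follows; `ScaleByTwo` is a hypothetical example and not the offload function from this PR. It uses the `torch.cuda.amp` import path available at the time of this PR; newer PyTorch releases prefer `torch.amp.custom_fwd(device_type="cuda")` and may emit a deprecation warning for this one.

```python
import torch
from torch.cuda.amp import custom_bwd, custom_fwd


class ScaleByTwo(torch.autograd.Function):
    """Hypothetical toy op computing y = 2 * x."""

    @staticmethod
    @custom_fwd  # remembers the autocast state seen in forward
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @custom_bwd  # replays that autocast state while running backward
    def backward(ctx, grad_output):
        # d(2x)/dx = 2
        return grad_output * 2


# Runs on CPU too; autocast is simply disabled there.
x = torch.ones(3, requires_grad=True)
y = ScaleByTwo.apply(x)
y.sum().backward()
grad = x.grad.tolist()
```

Without the decorators, a backward pass invoked outside the `autocast` context could run its ops in a different precision than the forward pass did.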

blefaudeux

nice, LGTM, thanks Anjali !

min-xu-ai

nice. is there doc changes needed or updating the change log file?

anj-s · 2021-02-11T10:24:16Z

> nice. is there doc changes needed or updating the change log file?

I am committing to a feature branch, which I will merge to master (very soon, I think). I'll make sure to add doc changes and changelog entries as needed.

…s on 1 GPU. (#432)

* clean start
* removing per layer split strategy, probably not that useful indeed
* initial transformer benchmark
* hack, enable testing ViT + offload, python3 benchmarks/oss.py --epochs 2 --optim_type oss_offload_ddp --batch_size=32 --model vit_large_patch16_224
* proper cuda streams and device, something off in terms of mems consumption
* minor, stashing
* unit test fix
* removing all the distributed parts
* simpler test, needs debugging
* working OOP, running a model which does not fit on the gpu memory
* spring cleaning
* removing the ill-advised optimizer bits, better keep that orthogonal
* [offload] Add support for activation offloading + other changes (#367)
* initial fwd/bwd commit
* checkpoint work
* modify shard loop
* activation offloading and test to start with
* fix lint errors
* update comments
* fix lint
* remove unused var
* remove commented out lines
* modify name
* remove break
* remove profiler comments
* avoid saving inputs
* fix lint errors

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

* [offload] Add support for fp16 training (#374)
* initial fwd/bwd commit
* checkpoint work
* modify shard loop
* activation offloading and test to start with
* fix lint errors
* update comments
* fix lint
* remove unused var
* remove commented out lines
* modify name
* remove break
* remove profiler comments
* add support for fp16
* add unit tests
* fix lint errors
* fix test failure

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

* [offload] Add support for activation checkpointing for all layers. (#381)
* initial fwd/bwd commit
* checkpoint work
* modify shard loop
* activation offloading and test to start with
* fix lint errors
* update comments
* fix lint
* remove unused var
* remove commented out lines
* modify name
* remove break
* remove profiler comments
* add support for fp16
* add unit tests
* fix lint errors
* fix test failure
* cp work, incorrect output dimensions still need to be fixed
* fixed activation outputs
* intermediate cp of work
* add tests
* fix lint errors

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

* add support for microbatches
* revert benchmark config changes
* add parametrization
* fix lint errors and tests
* skip test for 1.5
* fix lint errors
* skip test if there are no GPUs
* fix lint errors
* fix lint errors
* move experimental to the fairscale repo
* lint error fixes
* modify test imports
* lint error fixes
* move offload files to the experimental directory
* move tests and benchmarks to their forlder
* fix mypy errors
* cp intermediate working benchmarks
* more changes
* split benchmark configs
* remove print statements
* fix lint errors
* remove unused print
* stress testing
* remove unused file
* change param nae
* lint fixes
* move file to the right folder
* offload_experimental
* add doc string
* add error message

Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@gmail.com>
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@protonmail.com>
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

Anjali Sridhar added 13 commits February 5, 2021 14:01

initial fwd/bwd commit

89451bf

checkpoint work

c569679

modify shard loop

4e357cd

activation offloading and test to start with

510de2d

fix lint errors

c8df2a6

update comments

6d53152

fix lint

8bee0cc

remove unused var

a825e96

remove commented out lines

29c8c34

modify name

a5d1c88

remove break

48e3e25

remove profiler comments

3befc5c

add support for fp16

e10a874

facebook-github-bot added the CLA Signed label Feb 9, 2021

anj-s changed the title ~~[offload] Add support for fp16 training + tests + benchmark~~ [offload] Add support for fp16 training Feb 9, 2021

anj-s changed the base branch from master to offload_optimizer February 9, 2021 20:42

add unit tests

3c4e71d

anj-s requested a review from blefaudeux February 9, 2021 20:55

fix lint errors

901390f

blefaudeux reviewed Feb 10, 2021

blefaudeux approved these changes Feb 10, 2021

fix test failure

1df2619

min-xu-ai approved these changes Feb 11, 2021

Base automatically changed from offload_optimizer to offload_experimental February 12, 2021 19:14

fix merge conflicts

702b402

anj-s merged commit c2ac144 into offload_experimental Feb 12, 2021

anj-s deleted the offload_fp16 branch February 12, 2021 19:35
